Abstract. The nonlinear filter associated with the discrete time signal-observation model
$(X_k,Y_k)$ is known to forget its initial condition as $k\to\infty$ regardless of the observation
structure when the signal possesses sufficiently strong ergodic properties. Conversely, it
stands to reason that if the observations are sufficiently informative, then the nonlinear filter
should forget its initial condition regardless of any properties of the signal. We show that for
observations of additive type $Y_k = h(X_k)+\xi_k$ with invertible observation function $h$ (under
mild regularity assumptions on $h$ and on the distribution of the noise $\xi_k$), the filter is indeed
stable in a weak sense without any assumptions at all on the signal process. If the signal
satisfies a uniform continuity assumption, weak stability can be strengthened to stability in
total variation.



1. Introduction 

Let $(E,\mathcal{B}(E))$ and $(F,\mathcal{B}(F))$ be Polish spaces endowed with their Borel
$\sigma$-fields, and let $P : E\times\mathcal{B}(E)\to[0,1]$ be a given transition probability
kernel. On the sequence space $\Omega = E^{\mathbb{Z}_+}\times F^{\mathbb{Z}_+}$ with the canonical
coordinate projections $X_n(x,y)=x(n)$, $Y_n(x,y)=y(n)$, we define the family of probability
measures $\mathbf{P}^\mu$ (for any probability measure $\mu$ on $E$) such that $(X_n)_{n\ge 0}$ is a
Markov chain with initial measure $X_0\sim\mu$ and transition probability $P$, and such that
$Y_n = H(X_n,\xi_n)$ for every $n\ge 0$, where $\xi_n$ is an i.i.d. sequence independent of
$(X_n)_{n\ge 0}$. A time series model of this type, called a hidden Markov model, has a wide
variety of applications in science, engineering, statistics and finance; see, e.g., [?]. The
process $(X_n)_{n\ge 0}$ is known as the signal process (and $E$ is the signal state space), while
$(Y_n)_{n\ge 0}$ is called the observation process (and $F$ is the observation state space).

As $E$ is Polish, we may define for every $\mu$ the regular conditional probabilities

$$\pi^\mu_{n-}(\,\cdot\,) := \mathbf{P}^\mu(X_n\in\,\cdot\,|\,Y_0,\ldots,Y_{n-1}), \qquad n\ge 1,$$

and

$$\pi^\mu_{n}(\,\cdot\,) := \mathbf{P}^\mu(X_n\in\,\cdot\,|\,Y_0,\ldots,Y_{n}), \qquad n\ge 0.$$

Here $\pi^\mu_{n-}$ is called the one step predictor of the signal given the observations, while
$\pi^\mu_n$ is known as the nonlinear filter. These objects play a central role in the statistical
theory of hidden Markov models. A question which has generated considerable interest in recent
years is whether, as $n\to\infty$, the filter $\pi^\mu_n$ becomes insensitive to the choice of the
initial measure $\mu$. Broadly speaking, the filter is said to be stable if $\pi^\mu_n$ and
$\pi^\nu_n$ converge towards one another in a suitably chosen manner as $n\to\infty$ (e.g.,
$\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV}\to 0$ $\mathbf{P}^\mu$-a.s.) for a large class of initial measures
$\mu,\nu$ (e.g., for all $\mu\ll\nu$).

The filter stability property is of significant practical interest, as the initial measure (a
Bayesian prior) may be difficult to characterize. When the filter is stable, we can guarantee
that it will nonetheless generate optimal estimates of the signal process after an initial
transient. Moreover, the stability property also plays a key role in various important auxiliary
problems, such as proving consistency of maximum likelihood estimates and proving uniform
convergence of approximate filtering algorithms. On the other hand, the filter stability
problem poses a set of interesting mathematical questions in the theory of nonlinear estimation,
many of which have yet to be fully resolved. An overview of the state-of-the-art can be found
in [?].

Intuitively one expects filter stability to be caused by two separate mechanisms: 

(1) If the signal process itself becomes insensitive to its initial condition after a long time 
interval (i.e., the signal is ergodic) one would expect the filter to inherit this property. 

(2) If the observations are informative, one would expect that the information in the 
observations will eventually obsolete the prior information contained in the initial 
measure. 

In the two special cases where a detailed characterization of filter stability is available (for
linear Gaussian models [12] and for finite state signals [17]), the notion of detectability
embodies precisely this intuition. It thus seems reasonable to conjecture that it is true in great
generality that these two mechanisms conspire to bring about the stability of the filter. To
date, however, detectability conditions for filter stability are only known in the
abovementioned special cases. To gain further insight, it is therefore instructive to study each
of the mechanisms separately in a general setting. In particular, the two extreme cases lead to
the following fundamental problems: (i) can we find conditions on the signal process such that the
filter is stable regardless of the observation structure? and (ii) can we find conditions on the
observation structure such that the filter is stable regardless of any properties of the signal?

Various solutions to Problem (i) can be found in the literature. It was shown by Atar
and Zeitouni [?] and by Del Moral and Guionnet [?] that the filter is stable whenever the
signal possesses a certain strong mixing condition, regardless of the observation structure.
The mixing condition was weakened to some extent by Chigansky and Liptser [?]. Somewhat
surprisingly, assuming only ergodicity of the signal is not sufficient to guarantee stability
(see [5, section 5]); both the mixing condition and the condition of Chigansky and Liptser are
strictly stronger than ergodicity. Under a mild nondegeneracy assumption on the observations,
however, ergodicity of the signal is already sufficient to ensure stability of the filter [15].

In contrast to the first problem, a solution to Problem (ii) has hitherto been elusive. Unlike
results based on ergodicity or mixing, stability results based on the structure of the
observations have appeared only recently in [?, 17, 16]. It appears, however, that it is more
natural in this context to study stability of the predictor than stability of the filter. In
particular, the following general result was established in [16, proposition 3.11] for additive
observations of the form $Y_n = h(X_n)+\xi_n$, where $E=F=\mathbb{R}^n$,
$h:\mathbb{R}^n\to\mathbb{R}^n$ is a given observation function, and $\xi_n$ is a sequence of
i.i.d. $\mathbb{R}^n$-valued random variables independent of $(X_n)_{n\ge 0}$.



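Proposition 1.1 ([16]). Suppose that the following hold:

(1) $h$ possesses a uniformly continuous inverse; and

(2) the characteristic function of $\xi_k$ vanishes nowhere.

Then $\|\pi^\mu_{n-}-\pi^\nu_{n-}\|_{\rm BL}\to 0$ as $n\to\infty$ $\mathbf{P}^\mu$-a.s. whenever
$\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}$.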

Here $\|\cdot\|_{\rm BL}$ denotes the dual bounded-Lipschitz distance (to be defined below). The
assumptions of this result certainly conform to the idea that the observations are 'informative':
$Y_k$ is simply a noisy and distorted version of $X_k$. Note also that this result places no
conditions whatsoever on the signal process except the Markov property. However, the result is a
statement about the one step predictor and not about the filter.

DISCRETE TIME NONLINEAR FILTERS WITH INFORMATIVE OBSERVATIONS ARE STABLE

The main purpose of this note is to point out that under the mild additional assumption
that the law of the noise variables $\xi_k$ has a density, a slightly weaker version of
proposition 1.1 holds also when the predictor is replaced by the filter. We therefore provide an
affirmative answer to the conjecture that there exists a solution to Problem (ii) above. The proof
of this result adapts a coupling argument due to Ocone and Pardoux [?, lemma 3.6].

Remark 1.2. In the continuous time setting, it is known that a result along the lines of
proposition 1.1 holds for the filter when the signal state space $E$ is assumed to be compact
(under the mild assumption that the signal is Feller), see [17]. This is not a satisfactory
solution to Problem (ii), however, as unstable signals are ruled out in a compact state space.

Even if one is willing to make assumptions on the signal process, the case of non-ergodic 
signals has received comparatively little attention in the literature. Previous results in the 
non-ergodic setting show that the filter is stable in the total variation distance, but only under 
strong assumptions on both the signal and the observation process [?]. In particular,
these results only hold when the signal to noise ratio of the observations is sufficiently large 
(this appears to be an inherent restriction of the method of proof used in these papers). 
In addition to our main result, we will show that the filter is stable in the total variation 
distance under significantly weaker assumptions than have been required in previous work. 
In particular, our result holds for an arbitrary signal to noise ratio. To this end, we must 
investigate when the convergence in the dual bounded-Lipschitz distance in our main result 
can be strengthened to convergence in the total variation distance. For this purpose we will 
introduce a suitable uniform continuity assumption on the signal transition kernel. 



2. Notation and Main Results 



2.1. Notation. For any Polish space $S$ endowed with a complete metric $d$ (when
$S=\mathbb{R}^n$ we will always choose the Euclidean metric $d(x,y)=\|x-y\|$), define

$$\|f\|_\infty = \sup_{x}|f(x)|, \qquad
\|f\|_L = \sup_{x\ne y}\frac{|f(x)-f(y)|}{d(x,y)}
\qquad\text{for any } f:S\to\mathbb{R}.$$

If $\|f\|_L<\infty$, the function $f$ is Lipschitz continuous. Denote by
${\rm Lip} = \{f:S\to\mathbb{R} : \|f\|_\infty\le 1 \text{ and } \|f\|_L\le 1\}$ the unit ball in
the space of 1-Lipschitz functions. Then for any two probability measures $\mu,\nu$ on $S$, the
dual bounded-Lipschitz norm is defined as

$$\|\mu-\nu\|_{\rm BL} = \sup_{f\in{\rm Lip}}\left|\int f(x)\,\mu(dx)-\int f(x)\,\nu(dx)\right|.$$

The supremum can equivalently be taken over a countable subfamily ${\rm Lip}_0\subset{\rm Lip}$
(${\rm Lip}_0$ does not depend on $\mu,\nu$) [16, lemma A.1]. As usual, the total variation norm
is defined as

$$\|\mu-\nu\|_{\rm TV} = \sup_{\|f\|_\infty\le 1}\left|\int f(x)\,\mu(dx)-\int f(x)\,\nu(dx)\right|.$$

Also in this case the supremum can be replaced by the supremum over a countable subfamily
$B_0\subset\{f:\|f\|_\infty\le 1\}$ (along the lines of [?, lemma 4.1]).
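
As a small illustration of how the two norms differ (a sketch that is not part of the original argument), consider point masses on the real line. For Dirac measures at $x$ and $y$ the supremum over ${\rm Lip}$ equals $\min(|x-y|,2)$, while the total variation norm, in the convention used here, equals $2$ as soon as $x\ne y$. The function names below are hypothetical and chosen only for this example.

```python
def tv_distance(p, q):
    """Total variation norm (convention of this paper, values in [0, 2])
    between two probability vectors on the same finite set."""
    return sum(abs(a - b) for a, b in zip(p, q))


def bl_distance_diracs(x, y):
    """Dual bounded-Lipschitz distance between point masses at x and y on R.

    Any f with |f| <= 1 and Lipschitz constant <= 1 satisfies
    |f(x) - f(y)| <= min(|x - y|, 2), and a clipped linear function attains it.
    """
    return min(abs(x - y), 2.0)


def tv_distance_diracs(x, y):
    """Total variation norm between point masses at x and y."""
    return 0.0 if x == y else 2.0


# Point masses at 1/n converge weakly to the point mass at 0,
# but never in total variation.
for n in (1, 10, 100, 1000):
    x_n = 1.0 / n
    print(n, bl_distance_diracs(x_n, 0.0), tv_distance_diracs(x_n, 0.0))
```

The loop shows the bounded-Lipschitz distance shrinking while the total variation norm stays at its maximal value, which is the gap that assumption 2.3 is later used to close.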
2.2. Main Results. In the following, we will work with a hidden Markov model where the
signal state space $E$ is a Polish space with complete metric $d$, and the observation state space
is Euclidean $F=\mathbb{R}^n$. We consider additive observations of the form
$Y_k = h(X_k)+\xi_k$ for all $k\ge 0$, where $h:E\to\mathbb{R}^n$ is the observation function and
$\xi_k$ is a sequence of i.i.d. $\mathbb{R}^n$-valued random variables which are independent of the
signal. The signal transition kernel $P$ is fixed at the outset and is not presumed to satisfy any
assumptions until further notice.
In our main result, we will impose the following assumption.

Assumption 2.1. The following hold:

(1) The observation function $h$ possesses a uniformly continuous inverse.

(2) The law of $\xi_k$ has a density $q_\xi$ with respect to the Lebesgue measure on $\mathbb{R}^n$.

(3) The Fourier transform of $q_\xi$ vanishes nowhere.

Note that this is an assumption on the observations only: nothing at all is assumed about
the signal at this point. Our main result thus holds regardless of any properties of the signal.



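Theorem 2.2. Suppose that assumption 2.1 holds. Then

$$\mathbf{E}^\mu(\|\pi^\mu_n-\pi^\nu_n\|_{\rm BL}) \xrightarrow{n\to\infty} 0
\qquad\text{whenever}\qquad
\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}.$$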

In order to strengthen the convergence to the total variation distance, we do need to impose 
an assumption on the signal. The following is essentially a uniform strong Feller assumption. 

Assumption 2.3. The signal transition kernel $P$ satisfies

$$\|P(x_n,\cdot)-P(y_n,\cdot)\|_{\rm TV} \xrightarrow{n\to\infty} 0
\qquad\text{whenever}\qquad d(x_n,y_n)\xrightarrow{n\to\infty} 0.$$

In other words, the measure-valued map $x\mapsto P(x,\cdot)$ is uniformly continuous for the total
variation distance on the space of probability measures.

We now obtain the following result. 



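Theorem 2.4. Suppose that assumptions 2.1 and 2.3 hold. Then

$$\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV} \xrightarrow{n\to\infty} 0 \quad \mathbf{P}^\mu\text{-a.s.}
\qquad\text{whenever}\qquad
\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}.$$

A typical example where assumption 2.3 holds is the following.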
Proposition 2.5. Let $E=\mathbb{R}^m$. Suppose that the signal is defined by the recursion

$$X_{k+1} = b(X_k)+\sigma(X_k)\,\eta_k,$$

where $b$, $\sigma$, $\eta_k$ satisfy the following assumptions:

(1) $b:\mathbb{R}^m\to\mathbb{R}^m$ and $\sigma:\mathbb{R}^m\to\mathbb{R}^{m\times m}$ are uniformly continuous.

(2) $\sigma$ is uniformly bounded from below: $\|\sigma(x)v\|\ge\alpha\|v\|$ for all $x,v\in\mathbb{R}^m$ and some $\alpha>0$.

(3) $\eta_k$ are i.i.d. $\mathbb{R}^m$-valued random variables, whose law possesses a density $q_\eta$ with respect
to the Lebesgue measure on $\mathbb{R}^m$.

Then assumption 2.3 holds. If $q_\eta$ and $q_\xi$ are strictly positive and assumption 2.1 holds, then

$$\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV}\xrightarrow{n\to\infty}0 \quad \mathbf{P}^\gamma\text{-a.s. for any } \mu,\nu,\gamma.$$
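
To see assumption 2.3 at work in the simplest instance of this model, the sketch below assumes a scalar signal ($m=1$) with constant $\sigma$ and standard Gaussian $\eta_k$. These choices, the drift `unstable_drift`, and the function names are assumptions made only for illustration. In that case $P(x,\cdot)=N(b(x),\sigma^2)$, and the total variation norm between two such kernels (in the convention of section 2.1, with values in $[0,2]$) has the closed form $2\,\mathrm{erf}\big(|b(x)-b(y)|/(2\sqrt{2}\,\sigma)\big)$.

```python
import math


def tv_gaussian_kernels(x, y, b, sigma):
    """TV norm (values in [0, 2]) between N(b(x), sigma^2) and N(b(y), sigma^2).

    This is the L1 distance between the two Gaussian densities, which for
    equal variances reduces to 2 * erf(|b(x) - b(y)| / (2 * sqrt(2) * sigma)).
    """
    shift = abs(b(x) - b(y))
    return 2.0 * math.erf(shift / (2.0 * math.sqrt(2.0) * sigma))


def unstable_drift(x):
    """Hypothetical drift b(x) = 2x: uniformly continuous but not ergodic."""
    return 2.0 * x


# The bound depends on x and y only through b(x) - b(y), so it is uniform in
# the location: shifting both points far away does not change the value.
for gap in (1.0, 0.1, 0.01, 0.001):
    near = tv_gaussian_kernels(0.0, gap, unstable_drift, 1.0)
    far = tv_gaussian_kernels(1000.0, 1000.0 + gap, unstable_drift, 1.0)
    print(gap, near, far)
```

The signal in this example is unstable, yet the kernel is uniformly continuous in total variation, which is all that theorem 2.4 asks of the signal.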

This result should be compared to the main results in [3], [13], [?], where a very similar model
is investigated. However, in these references total variation stability is proved only when the 
signal to noise ratio is sufficiently high. This is an artefact of the quantitative method of 
proof where two rates of expansion are compared: the filter is stable if one of the rates 'wins', 
which leads to a requirement on the signal to noise ratio. Our qualitative approach does 
not depend on the signal to noise ratio, however, so that evidently the assumptions required
for the balancing of rates are stronger than is needed for filter stability (see [16] for further
discussion). On the other hand, our approach can not provide an estimate of the rate of 
stability. 

Remark 2.6. Our results require that the observation state space is Euclidean $F=\mathbb{R}^n$, as
they rely on properties of convolutions in $\mathbb{R}^n$ (an extension to the case where $F$ is a
locally compact abelian group may be feasible). In contrast, we have only assumed that the signal
state space is Polish. Note, however, that assumption 2.1 requires the existence of a uniformly
continuous map $h^{-1}:\mathbb{R}^n\to E$ such that $h^{-1}(h(x))=x$ for all $x\in E$. In
particular, $h$ is an embedding of $E$ into $\mathbb{R}^n$, so that $E$ can not be larger (e.g., of
higher dimension) than $\mathbb{R}^n$. This is to be expected, of course, as filter stability in
the case where $h$ is not invertible must depend on specific properties of the signal process such
as observability or ergodicity.



Remark 2.7. Theorems 2.2 and 2.4 provide stability of the filter whenever the initial measures
satisfy $\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}$.
Absolute continuity of the initial measures $\mu\ll\nu$ is sufficient for this to hold, but is not
always necessary. For example, if $\mu P^k\ll\nu P^k$ for some $k\ge 0$ and the observation density
is strictly positive, then it is not difficult to prove that
$\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}$. In
particular, if the signal possesses a strictly positive transition density, then
$\mu P\sim\nu P$ for every $\mu,\nu$ and we obtain stability for arbitrary initial measures
provided $q_\xi>0$. This is the case, for example, in the setting of proposition 2.5.



Remark 2.8. Our results do not give a rate of stability, while most previous work on filter 
stability gives exponential convergence rates. The following simple example demonstrates 
that exponential stability can not be expected in the general setting of this paper. 

Let $E=F=\mathbb{R}$ and $Y_k = X_k+\xi_k$, where $\xi_k$ are i.i.d. $N(0,1)$ and $X_k=X_0$ for
all $k$. This setting certainly satisfies the requirements of theorem 2.2. We choose
$\mu=N(\alpha,\sigma^2)$ and $\nu=N(\beta,\sigma^2)$ for some $\alpha,\beta,\sigma\in\mathbb{R}$
(so $\mathbf{P}^\mu\sim\mathbf{P}^\nu$). Linear filtering theory shows that $\pi^\mu_k$ is a
random Gaussian measure with mean $Z^\mu_k$ and variance $V_k$ given by

$$Z^\mu_k = \frac{\alpha}{1+\sigma^2(k+1)} + \frac{\sigma^2(k+1)}{1+\sigma^2(k+1)}
\left[\frac{1}{k+1}\sum_{\ell=0}^{k}Y_\ell\right],
\qquad V_k = \frac{\sigma^2}{1+\sigma^2(k+1)},$$

and similarly for $\pi^\nu_k$, $Z^\nu_k$, where $\alpha$ is replaced by $\beta$. Note that by the
law of large numbers, the second term in the expression for $Z^\mu_k$ (and $Z^\nu_k$) converges
$\mathbf{P}^\mu$-a.s. to $X_0$. But

$$\|\pi^\mu_n-\pi^\nu_n\|_{\rm BL} \ge
\left|\int\cos(x)\,\pi^\mu_n(dx)-\int\cos(x)\,\pi^\nu_n(dx)\right|
= e^{-V_n/2}\,|\cos(Z^\mu_n)-\cos(Z^\nu_n)|.$$

Noting that $\cos(Z^\nu_n) = \cos(Z^\mu_n) - \sin(Z^\mu_n)\,(\beta-\alpha)/(1+\sigma^2(n+1)) + o(n^{-1})$,
we find that

$$\liminf_{n\to\infty} n\,\|\pi^\mu_n-\pi^\nu_n\|_{\rm BL} \ge
\liminf_{n\to\infty} n\,e^{-V_n/2}\,|\cos(Z^\mu_n)-\cos(Z^\nu_n)|
= |\sin(X_0)|\,\frac{|\beta-\alpha|}{\sigma^2} > 0 \quad \mathbf{P}^\mu\text{-a.s.}$$

By Fatou's lemma $\liminf_{n\to\infty} n\,\mathbf{E}^\mu(\|\pi^\mu_n-\pi^\nu_n\|_{\rm BL})>0$, so
that evidently the stability rate of the filter is at best of order $O(n^{-1})$ and is certainly
not exponential.

It is interesting to note that by [?, p. 528, theorem 4] and by the equivalence of the
Hellinger and total variation distances,
$\sum_{n=0}^\infty\|\pi^\mu_{n-}h^{-1}*\xi-\pi^\nu_{n-}h^{-1}*\xi\|^2_{\rm TV}<\infty$
$\mathbf{P}^\mu$-a.s. The convergence of the expression
$\|\pi^\mu_{n-}h^{-1}*\xi-\pi^\nu_{n-}h^{-1}*\xi\|_{\rm TV}$ which appears in the proof of
lemma 3.1 below is therefore generally, in a sense, not much worse than $o(n^{-1/2})$. It is
unclear, however, whether this property survives the subsequent manipulations that lead to
stability of the filter.
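
The polynomial rate in this example can be checked numerically. The following sketch simulates the constant-signal model, computes the Gaussian posterior mean and variance in closed form, and evaluates the scaled lower bound $n\,e^{-V_n/2}|\cos(Z^\mu_n)-\cos(Z^\nu_n)|$. The parameter values ($\alpha=0$, $\beta=1$, $\sigma^2=1$, $X_0=1$), the random seed and the function names are assumptions made only for illustration.

```python
import math
import random


def gaussian_posterior(prior_mean, prior_var, ys):
    """Mean and variance of X_0 given ys, for X_0 ~ N(prior_mean, prior_var)
    and Y_l = X_0 + xi_l with xi_l i.i.d. N(0, 1)."""
    m = len(ys)  # number of observations, i.e. k + 1 in the text
    denom = 1.0 + prior_var * m
    mean = (prior_mean + prior_var * sum(ys)) / denom
    var = prior_var / denom
    return mean, var


def cos_lower_bound(alpha, beta, sigma2, ys):
    """e^{-V/2} |cos Z^mu - cos Z^nu|, a lower bound on the BL distance
    between the filters started from N(alpha, sigma2) and N(beta, sigma2)."""
    z_mu, var = gaussian_posterior(alpha, sigma2, ys)
    z_nu, _ = gaussian_posterior(beta, sigma2, ys)
    return math.exp(-var / 2.0) * abs(math.cos(z_mu) - math.cos(z_nu))


# Simulate one observation path for a fixed (hypothetical) value of X_0.
rng = random.Random(0)
x0, alpha, beta, sigma2 = 1.0, 0.0, 1.0, 1.0
ys = [x0 + rng.gauss(0.0, 1.0) for _ in range(100_000)]

limit = abs(math.sin(x0)) * abs(beta - alpha) / sigma2
for m in (100, 10_000, 100_000):
    print(m, m * cos_lower_bound(alpha, beta, sigma2, ys[:m]), "limit:", limit)
```

The scaled quantity settles near $|\sin(X_0)|\,|\beta-\alpha|/\sigma^2$ rather than vanishing, which is consistent with a stability rate no faster than $O(n^{-1})$.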
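3. Proof of Theorem 2.2

Let us begin by recalling a part of the proof of proposition 1.1.

Lemma 3.1. Suppose that the characteristic function of $\xi_k$ vanishes nowhere, and that
moreover $\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}$.
Then $\|\pi^\mu_{n-}h^{-1}-\pi^\nu_{n-}h^{-1}\|_{\rm BL}\to 0$ as $n\to\infty$ $\mathbf{P}^\mu$-a.s.

Proof. Denote the law of $\xi_k$ as $\xi$. It is easily verified that for any probability measure $\rho$

$$\mathbf{E}^\rho(f(Y_n)\,|\,Y_0,\ldots,Y_{n-1}) = \int f(y)\,(\pi^\rho_{n-}h^{-1}*\xi)(dy)
\quad \mathbf{P}^\rho\text{-a.s. for any bounded measurable } f,$$

where $*$ denotes convolution. A classic result of Blackwell and Dubins [2, section 2] shows
that

$$\|\pi^\mu_{n-}h^{-1}*\xi-\pi^\nu_{n-}h^{-1}*\xi\|_{\rm TV}\xrightarrow{n\to\infty}0
\quad \mathbf{P}^\mu\text{-a.s.},$$

where $\|\cdot\|_{\rm TV}$ is the total variation norm. The result now follows from [16, proposition
C.2]. □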



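To proceed, recall that due to the Bayes formula (e.g., [?, section 3.2.2])

$$\int f(x)\,\pi^\mu_n(dx) =
\frac{\int f(x)\,q_\xi(Y_n-h(x))\,\pi^\mu_{n-}(dx)}{\int q_\xi(Y_n-h(x))\,\pi^\mu_{n-}(dx)}
\qquad\text{for all bounded } f:E\to\mathbb{R} \quad \mathbf{P}^\mu\text{-a.s.}$$

Note that the denominator of this expression is strictly positive $\mathbf{P}^\mu$-a.s. Moreover, we have

$$\mathbf{E}^\mu(f(Y_n)\,|\,Y_0,\ldots,Y_{n-1}) = \int\!\!\int f(y)\,q_\xi(y-h(x))\,\pi^\mu_{n-}(dx)\,dy
\quad \mathbf{P}^\mu\text{-a.s.}$$

for any bounded function $f:\mathbb{R}^n\to\mathbb{R}$ as in the proof of lemma 3.1. Therefore, it
follows from the disintegration of measures that $\mathbf{P}^\mu$-a.s.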

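$$\mathbf{E}^\mu(\|\pi^\mu_n-\pi^\nu_n\|_{\rm BL}\,|\,Y_0,\ldots,Y_{n-1}) =
\int \sup_{f\in{\rm Lip}_0}\left|
\int f(x)\,q_\xi(y-h(x))\,\pi^\mu_{n-}(dx) -
\frac{\int f(x)\,q_\xi(y-h(x))\,\pi^\nu_{n-}(dx)}{\int q_\xi(y-h(x))\,\pi^\nu_{n-}(dx)}
\int q_\xi(y-h(x))\,\pi^\mu_{n-}(dx)\right| dy,$$

where it should be noted that by the assumption that
$\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}$ in theorem
2.2 (which we presume to be in force throughout) all quantities in this expression are
$\mathbf{P}^\mu$-a.s. uniquely defined and both denominators are strictly positive $\mathbf{P}^\mu$-a.s.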
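
The prediction step and the Bayes formula used above can be made concrete on a finite state space. The sketch below is only an illustration under assumptions not made in the text: the signal takes finitely many values, the noise density $q_\xi$ is standard Gaussian, and all names (`predict`, `bayes_update`, `run_filter`) are hypothetical. The demonstration uses a constant, hence non-ergodic, signal with an injective observation function and compares two filters started from different priors on the same observations.

```python
import math


def predict(filt, P):
    """One step predictor: pred[j] = sum_i filt[i] * P[i][j]."""
    return [sum(filt[i] * P[i][j] for i in range(len(filt)))
            for j in range(len(P[0]))]


def bayes_update(pred, h, y, q):
    """Bayes formula: the filter is proportional to q(y - h[i]) * pred[i]."""
    weights = [q(y - h[i]) * pred[i] for i in range(len(pred))]
    total = sum(weights)  # strictly positive when q > 0
    return [w / total for w in weights]


def gauss_density(u):
    """Standard normal density, playing the role of q_xi."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def run_filter(prior, P, h, ys, q):
    """Filter after observing ys[0], ..., ys[-1], started from the given prior."""
    filt = bayes_update(prior, h, ys[0], q)  # pi_0 conditions on Y_0 only
    for y in ys[1:]:
        filt = bayes_update(predict(filt, P), h, y, q)
    return filt


# Constant signal (identity kernel), injective observation function.
P = [[1.0, 0.0], [0.0, 1.0]]
h = [0.0, 1.0]
ys = [1.0] * 20  # a hypothetical observation record
f_mu = run_filter([0.9, 0.1], P, h, ys, gauss_density)
f_nu = run_filter([0.1, 0.9], P, h, ys, gauss_density)
print(sum(abs(a - b) for a, b in zip(f_mu, f_nu)))
```

Both filters concentrate on the state favoured by the observations even though the signal itself never mixes, which is the mechanism (2) described in the introduction.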

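It will be useful for what follows to rewrite the above expression in a more convenient form:

$$\mathbf{E}^\mu(\|\pi^\mu_n-\pi^\nu_n\|_{\rm BL}\,|\,Y_0,\ldots,Y_{n-1}) =
\int \sup_{f\in{\rm Lip}_0}\left|
\int f(h^{-1}(x))\,q_\xi(y-x)\,\pi^\mu_{n-}h^{-1}(dx) -
\frac{\int f(h^{-1}(x))\,q_\xi(y-x)\,\pi^\nu_{n-}h^{-1}(dx)}{\int q_\xi(y-x)\,\pi^\nu_{n-}h^{-1}(dx)}
\int q_\xi(y-x)\,\pi^\mu_{n-}h^{-1}(dx)\right| dy.$$

Here we have fixed a uniformly continuous function $h^{-1}:\mathbb{R}^n\to E$ such that
$h^{-1}(h(x))=x$ for all $x\in E$; the existence of this function is guaranteed by assumption 2.1.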
Lemma 3.2. Let $\rho,\rho'$ be two probability measures on $\mathbb{R}^n$, and let $Z,Z'$ be (not
necessarily independent) $\mathbb{R}^n$-valued random variables such that $Z\sim\rho$ and
$Z'\sim\rho'$. Then

$$\int \sup_{f\in{\rm Lip}_0}\left|
\int f(h^{-1}(x))\,q_\xi(y-x)\,\rho(dx) -
\frac{\int f(h^{-1}(x))\,q_\xi(y-x)\,\rho'(dx)}{\int q_\xi(y-x)\,\rho'(dx)}
\int q_\xi(y-x)\,\rho(dx)\right| dy
\le \mathbf{E}(d(h^{-1}(Z),h^{-1}(Z'))\wedge 2) + 2\int \mathbf{E}(|q_\xi(y-Z)-q_\xi(y-Z')|)\,dy,$$

where by convention $0/0=1$.

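Proof. The left hand side of the expression in the statement of the lemma is

$$\Delta := \int \sup_{f\in{\rm Lip}_0}\left|
\mathbf{E}(f(h^{-1}(Z))\,q_\xi(y-Z)) -
\frac{\mathbf{E}(f(h^{-1}(Z'))\,q_\xi(y-Z'))}{\mathbf{E}(q_\xi(y-Z'))}\,
\mathbf{E}(q_\xi(y-Z))\right| dy.$$

Now note that for any $f\in{\rm Lip}$ and $x,y\in E$ we have $|f(x)-f(y)|\le d(x,y)\wedge 2$, so

$$\int \sup_{f\in{\rm Lip}_0}\left|\mathbf{E}(f(h^{-1}(Z))\,q_\xi(y-Z)) - \mathbf{E}(f(h^{-1}(Z'))\,q_\xi(y-Z))\right| dy
\le \int \mathbf{E}(\{d(h^{-1}(Z),h^{-1}(Z'))\wedge 2\}\,q_\xi(y-Z))\,dy
= \mathbf{E}(d(h^{-1}(Z),h^{-1}(Z'))\wedge 2),$$

where we have used the Fubini-Tonelli theorem to exchange the order of integration. Thus

$$\Delta \le \mathbf{E}(d(h^{-1}(Z),h^{-1}(Z'))\wedge 2) +
\int \sup_{f\in{\rm Lip}_0}\left|
\mathbf{E}(f(h^{-1}(Z'))\,q_\xi(y-Z)) -
\frac{\mathbf{E}(f(h^{-1}(Z'))\,q_\xi(y-Z'))}{\mathbf{E}(q_\xi(y-Z'))}\,
\mathbf{E}(q_\xi(y-Z))\right| dy.$$

Estimating $\mathbf{E}(f(h^{-1}(Z'))\,q_\xi(y-Z))$ by $\mathbf{E}(f(h^{-1}(Z'))\,q_\xi(y-Z'))$, we
similarly obtain

$$\Delta \le \mathbf{E}(d(h^{-1}(Z),h^{-1}(Z'))\wedge 2) + \int \mathbf{E}(|q_\xi(y-Z)-q_\xi(y-Z')|)\,dy +
\int \sup_{f\in{\rm Lip}_0}\left|
\mathbf{E}(f(h^{-1}(Z'))\,q_\xi(y-Z')) -
\frac{\mathbf{E}(f(h^{-1}(Z'))\,q_\xi(y-Z'))}{\mathbf{E}(q_\xi(y-Z'))}\,
\mathbf{E}(q_\xi(y-Z))\right| dy.$$

We now substitute in this expression

$$\mathbf{E}(f(h^{-1}(Z'))\,q_\xi(y-Z')) =
\frac{\mathbf{E}(f(h^{-1}(Z'))\,q_\xi(y-Z'))}{\mathbf{E}(q_\xi(y-Z'))}\,\mathbf{E}(q_\xi(y-Z')),$$

and note that $|\mathbf{E}(f(h^{-1}(Z'))\,q_\xi(y-Z'))/\mathbf{E}(q_\xi(y-Z'))|\le 1$ whenever
$\|f\|_\infty\le 1$. The remainder of the proof is now immediate. □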



A remarkable result due to Dudley [10, theorem 11.7.1], which extends the classical
Skorokhod representation theorem to the $\|\cdot\|_{\rm BL}$-uniformity, allows us to put this
lemma to good use.

Lemma 3.3. Let $\rho_n$ and $\rho'_n$, $n\ge 0$ be two sequences of probability measures on
$\mathbb{R}^n$ such that $\|\rho_n-\rho'_n\|_{\rm BL}\to 0$ as $n\to\infty$. Then the following quantity

$$\int \sup_{f\in{\rm Lip}_0}\left|
\int f(h^{-1}(x))\,q_\xi(y-x)\,\rho_n(dx) -
\frac{\int f(h^{-1}(x))\,q_\xi(y-x)\,\rho'_n(dx)}{\int q_\xi(y-x)\,\rho'_n(dx)}
\int q_\xi(y-x)\,\rho_n(dx)\right| dy$$

converges to zero as $n\to\infty$.
Proof. By [10, theorem 11.7.1] we can construct two sequences of $\mathbb{R}^n$-valued random
variables $Z_n$ and $Z'_n$, $n\ge 0$ on some underlying probability space such that
$Z_n\sim\rho_n$ and $Z'_n\sim\rho'_n$ for every $n$ and $\|Z_n-Z'_n\|\to 0$ a.s. as $n\to\infty$.
By the previous lemma, the expression in the statement of the present lemma (denoted $\Delta_n$)
is bounded by

$$\Delta_n \le \mathbf{E}(d(h^{-1}(Z_n),h^{-1}(Z'_n))\wedge 2) + 2\int \mathbf{E}(|q_\xi(y-Z_n)-q_\xi(y-Z'_n)|)\,dy$$

for every $n$. As $h^{-1}$ is uniformly continuous and $\|Z_n-Z'_n\|\to 0$ a.s. as $n\to\infty$, we
find that $d(h^{-1}(Z_n),h^{-1}(Z'_n))\to 0$ a.s. as $n\to\infty$. Thus the first term evidently
converges to zero by dominated convergence. To deal with the second term, note that

$$\int |q_\xi(y-Z_n)-q_\xi(y-Z'_n)|\,dy = \int |q_\xi(y+Z'_n-Z_n)-q_\xi(y)|\,dy
= \|T_{Z_n-Z'_n}q_\xi-q_\xi\|_{L^1(dy)},$$

where $(T_zf)(x)=f(x-z)$ denotes translation. But recall that translation is continuous in
the $L^1$-topology [11, proposition 8.5], so we find that

$$\|T_{Z_n-Z'_n}q_\xi-q_\xi\|_{L^1(dy)}\xrightarrow{n\to\infty}0 \quad\text{a.s.}$$

On the other hand,

$$\|T_{Z_n-Z'_n}q_\xi-q_\xi\|_{L^1(dy)} \le 2\,\|q_\xi\|_{L^1(dy)} = 2 \quad\text{for all } n.$$

Dominated convergence gives

$$\int \mathbf{E}(|q_\xi(y-Z_n)-q_\xi(y-Z'_n)|)\,dy = \mathbf{E}(\|T_{Z_n-Z'_n}q_\xi-q_\xi\|_{L^1(dy)})
\xrightarrow{n\to\infty}0,$$

where we have used the Fubini-Tonelli theorem to exchange the order of integration. □

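The proof of theorem 2.2 is now easily completed. Indeed, under our assumptions we obtain
$\|\pi^\mu_{n-}h^{-1}-\pi^\nu_{n-}h^{-1}\|_{\rm BL}\to 0$ $\mathbf{P}^\mu$-a.s. as $n\to\infty$ by
lemma 3.1, so the previous lemma gives

$$\mathbf{E}^\mu(\|\pi^\mu_n-\pi^\nu_n\|_{\rm BL}\,|\,Y_0,\ldots,Y_{n-1})\xrightarrow{n\to\infty}0
\quad \mathbf{P}^\mu\text{-a.s.}$$

Taking the expectation with respect to $\mathbf{P}^\mu$, and noting that
$\|\pi^\mu_n-\pi^\nu_n\|_{\rm BL}\le 2$ so that the dominated convergence theorem applies, yields
the proof.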



4. Proof of Theorem 2.4 



We begin by showing that the one step predictor is stable in mean total variation. 



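Lemma 4.1. Suppose that assumptions 2.1 and 2.3 hold. Then

$$\mathbf{E}^\mu(\|\pi^\mu_{n-}-\pi^\nu_{n-}\|_{\rm TV})\xrightarrow{n\to\infty}0
\qquad\text{whenever}\qquad
\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}.$$

Proof. Recall that

$$\int f(x)\,\pi^\mu_{n-}(dx) = \int Pf(x)\,\pi^\mu_{n-1}(dx)$$

for any bounded measurable function $f$. Therefore

$$\|\pi^\mu_{n-}-\pi^\nu_{n-}\|_{\rm TV} = \|\pi^\mu_{n-1}-\pi^\nu_{n-1}\|_G :=
\sup_{f\in G}\left|\int f(x)\,\pi^\mu_{n-1}(dx)-\int f(x)\,\pi^\nu_{n-1}(dx)\right|,$$

where $G=\{Pf : f\in B_0\}$ (recall that $B_0$ is a countable family of functions such that
$\|\mu-\nu\|_{\rm TV}=\|\mu-\nu\|_{B_0}$, see section 2.1). We now claim that the family $G$ is
uniformly bounded and uniformly equicontinuous. Indeed, it is immediate that $\|f\|_\infty\le 1$
for every $f\in G$, as this is the case for every $f\in B_0$. To prove uniform equicontinuity,
note that

$$\sup_{d(x,y)\le\delta}|f(x)-f(y)| \le \sup_{d(x,y)\le\delta}\,\sup_{f\in B_0}|Pf(x)-Pf(y)|
= \sup_{d(x,y)\le\delta}\|P(x,\cdot)-P(y,\cdot)\|_{\rm TV} =: \omega_P(\delta)$$

for every $f\in G$. By assumption 2.3, we evidently have $\omega_P(\delta)\to 0$ as $\delta\to 0$.
Uniform equicontinuity of $G$ is therefore established. To complete the proof, it remains to show
that

$$\mathbf{E}^\mu(\|\pi^\mu_n-\pi^\nu_n\|_G)\xrightarrow{n\to\infty}0
\qquad\text{whenever}\qquad
\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}.$$

This is established precisely as in the proof of theorem 2.2, however: the only modification
that must be made in the present setting is that the term
$\mathbf{E}(d(h^{-1}(Z),h^{-1}(Z'))\wedge 2)$ in lemma 3.2 is replaced by
$\mathbf{E}(\omega_P(d(h^{-1}(Z),h^{-1}(Z')))\wedge 2)$. □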



We now proceed to show stability of the filter (rather than the one step predictor). Stability 
in the mean follows trivially from the previous lemma and the following estimate. 

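Lemma 4.2. Whenever $\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}$, we have

$$\mathbf{E}^\mu(\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV}\,|\,Y_0,\ldots,Y_{n-1}) \le 2\,\|\pi^\mu_{n-}-\pi^\nu_{n-}\|_{\rm TV}
\quad \mathbf{P}^\mu\text{-a.s.}$$

Proof. As in the proof of theorem 2.2, we can write

$$\mathbf{E}^\mu(\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV}\,|\,Y_0,\ldots,Y_{n-1}) =
\int \sup_{f\in B_0}\left|
\int f(x)\,q_\xi(y-h(x))\,\pi^\mu_{n-}(dx) -
\frac{\int f(x)\,q_\xi(y-h(x))\,\pi^\nu_{n-}(dx)}{\int q_\xi(y-h(x))\,\pi^\nu_{n-}(dx)}
\int q_\xi(y-h(x))\,\pi^\mu_{n-}(dx)\right| dy.$$

It follows directly that we can estimate
$\mathbf{E}^\mu(\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV}\,|\,Y_0,\ldots,Y_{n-1})\le\Delta_1+\Delta_2$, where

$$\Delta_1 = \int \sup_{f\in B_0}\left|
\int f(x)\,q_\xi(y-h(x))\,\pi^\mu_{n-}(dx) - \int f(x)\,q_\xi(y-h(x))\,\pi^\nu_{n-}(dx)\right| dy,$$

$$\Delta_2 = \int \sup_{f\in B_0}\left|
\int f(x)\,q_\xi(y-h(x))\,\pi^\nu_{n-}(dx) -
\frac{\int f(x)\,q_\xi(y-h(x))\,\pi^\nu_{n-}(dx)}{\int q_\xi(y-h(x))\,\pi^\nu_{n-}(dx)}
\int q_\xi(y-h(x))\,\pi^\mu_{n-}(dx)\right| dy
\le \int \left|\int q_\xi(y-h(x))\,\pi^\mu_{n-}(dx) - \int q_\xi(y-h(x))\,\pi^\nu_{n-}(dx)\right| dy.$$

To estimate $\Delta_1$, note that

$$\sup_{f\in B_0}\left|\int f(x)\,q_\xi(y-h(x))\,\pi^\mu_{n-}(dx) - \int f(x)\,q_\xi(y-h(x))\,\pi^\nu_{n-}(dx)\right|
\le \sup_{f\in B_0}\int |f(x)|\,q_\xi(y-h(x))\,|\pi^\mu_{n-}-\pi^\nu_{n-}|(dx)
\le \int q_\xi(y-h(x))\,|\pi^\mu_{n-}-\pi^\nu_{n-}|(dx).$$

Therefore, the Fubini-Tonelli theorem gives

$$\Delta_1 \le \int\left\{\int q_\xi(y-h(x))\,dy\right\}|\pi^\mu_{n-}-\pi^\nu_{n-}|(dx)
= \|\pi^\mu_{n-}-\pi^\nu_{n-}\|_{\rm TV}.$$

$\Delta_2$ is estimated in the same fashion, and the proof is complete. □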
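We have now shown that under assumptions 2.1 and 2.3,

$$\mathbf{E}^\mu(\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV})\xrightarrow{n\to\infty}0
\qquad\text{whenever}\qquad
\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}.$$

It remains to prove that in fact

$$\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV}\xrightarrow{n\to\infty}0 \quad \mathbf{P}^\mu\text{-a.s.}
\qquad\text{whenever}\qquad
\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}.$$

Clearly it suffices to show that $\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV}$ is $\mathbf{P}^\mu$-a.s. convergent.
To this end, let $\mu\ll\gamma$. Then (see [15, corollary 5.7])

$$\|\pi^\mu_n-\pi^\gamma_n\|_{\rm TV} =
\frac{\mathbf{E}^\gamma\big(\big|\mathbf{E}^\gamma\big(\tfrac{d\mu}{d\gamma}(X_0)\,\big|\,\mathcal{F}^Y_{0,\infty}\vee\mathcal{F}^X_{n,\infty}\big)
-\mathbf{E}^\gamma\big(\tfrac{d\mu}{d\gamma}(X_0)\,\big|\,\mathcal{F}^Y_{0,n}\big)\big|\,\big|\,\mathcal{F}^Y_{0,n}\big)}
{\mathbf{E}^\gamma\big(\tfrac{d\mu}{d\gamma}(X_0)\,\big|\,\mathcal{F}^Y_{0,n}\big)}
\quad \mathbf{P}^\mu\text{-a.s.},$$

where $\mathcal{F}^Y_{0,n}:=\sigma\{Y_0,\ldots,Y_n\}$, $\mathcal{F}^Y_{0,\infty}:=\sigma\{Y_k:k\ge 0\}$, and
$\mathcal{F}^X_{n,\infty}:=\sigma\{X_k:k\ge n\}$. When $d\mu/d\gamma$ is bounded, the numerator
converges $\mathbf{P}^\gamma$-a.s. (hence $\mathbf{P}^\mu$-a.s.) by the martingale convergence
theorem (see, e.g., [?, theorem 2]) while the denominator converges to a $\mathbf{P}^\mu$-a.s.
strictly positive quantity
$\mathbf{E}^\gamma(d\mu/d\gamma(X_0)\,|\,\mathcal{F}^Y_{0,n})\to\mathbf{E}^\gamma(d\mu/d\gamma(X_0)\,|\,\mathcal{F}^Y_{0,\infty})>0$
$\mathbf{P}^\mu$-a.s. Evidently

$$\|\pi^\mu_n-\pi^\gamma_n\|_{\rm TV}\xrightarrow{n\to\infty}0 \quad \mathbf{P}^\mu\text{-a.s.}
\qquad\text{whenever}\qquad \mu\ll\gamma,\quad \|d\mu/d\gamma\|_\infty<\infty.$$

Now set $\gamma=(\mu+\nu)/2$, and note that $\|d\mu/d\gamma\|_\infty\le 2$, $\|d\nu/d\gamma\|_\infty\le 2$. Therefore

$$\|\pi^\mu_n-\pi^\gamma_n\|_{\rm TV}\xrightarrow{n\to\infty}0 \quad \mathbf{P}^\mu\text{-a.s.},
\qquad
\|\pi^\nu_n-\pi^\gamma_n\|_{\rm TV}\xrightarrow{n\to\infty}0 \quad \mathbf{P}^\nu\text{-a.s.}$$

If $\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}$, the second
statement holds also $\mathbf{P}^\mu$-a.s. The proof of theorem 2.4 is now easily completed by
applying the triangle inequality.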



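Remark 4.3. From the above expression, it can be read off that under assumptions 2.1 and 2.3

$$\mathbf{E}^\nu\Big(f(X_0)\,\Big|\,\bigcap_{n\ge 0}\mathcal{F}^Y_{0,\infty}\vee\mathcal{F}^X_{n,\infty}\Big)
= \mathbf{E}^\nu(f(X_0)\,|\,\mathcal{F}^Y_{0,\infty}) \qquad\text{whenever } \|f\|_\infty<\infty$$

for every $\nu$. With a little more work, one can show that similarly for every $\nu$

$$\mathbf{E}^\nu\Big(f(X_0,\ldots,X_k)\,\Big|\,\bigcap_{n\ge 0}\mathcal{F}^Y_{0,\infty}\vee\mathcal{F}^X_{n,\infty}\Big)
= \mathbf{E}^\nu(f(X_0,\ldots,X_k)\,|\,\mathcal{F}^Y_{0,\infty}) \qquad\text{whenever } \|f\|_\infty<\infty,$$

which implies that for every $\nu$

$$\bigcap_{n\ge 0}\mathcal{F}^Y_{0,\infty}\vee\mathcal{F}^X_{n,\infty} = \mathcal{F}^Y_{0,\infty}
\quad \mathbf{P}^\nu\text{-a.s.}$$

For the significance of this identity, we refer to [?] and the references therein.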



5. Proof of Proposition 2.5

We begin by proving the following representation.

Lemma 5.1. For fixed $x,x'\in E$, we have

$$\|P(x,\cdot)-P(x',\cdot)\|_{\rm TV} =
\int\left|q_\eta(\sigma(x)^{-1}\{\sigma(x')z-b(x)+b(x')\})\,\det(\sigma(x)^{-1}\sigma(x')) - q_\eta(z)\right| dz.$$

Proof. Note that $\sigma(x)$ is invertible for every $x$ as it is presumed to be lower bounded.
Therefore $P(x,\cdot)$ has density $p(x,z)=q_\eta(\sigma(x)^{-1}\{z-b(x)\})/\det(\sigma(x))$ with
respect to the Lebesgue measure on $\mathbb{R}^m$ for every $x$. This implies that

$$\|P(x,\cdot)-P(x',\cdot)\|_{\rm TV} = \int\left|
\frac{q_\eta(\sigma(x)^{-1}\{z-b(x)\})}{\det(\sigma(x))} -
\frac{q_\eta(\sigma(x')^{-1}\{z-b(x')\})}{\det(\sigma(x'))}\right| dz.$$

The result follows through a change of variables. □



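We now prove that assumption 2.3 holds in this setting.

Lemma 5.2. $\|P(x_n,\cdot)-P(y_n,\cdot)\|_{\rm TV}\to 0$ as $n\to\infty$ whenever $d(x_n,y_n)\to 0$
as $n\to\infty$.

Proof. Fix any sequence $x_n,y_n$ such that $d(x_n,y_n)\to 0$. By the previous lemma, it evidently
suffices to show that the following function converges to $q_\eta(z)$ in $L^1(dz)$ as $n\to\infty$:

$$g_n(z) = q_\eta(\sigma(x_n)^{-1}\{\sigma(y_n)z-b(x_n)+b(y_n)\})\,\det(\sigma(x_n)^{-1}\sigma(y_n)).$$

Suppose first that $q_\eta$ is continuous. Note that

$$\|\sigma(x_n)^{-1}\sigma(y_n)-I\| = \|\sigma(x_n)^{-1}\{\sigma(y_n)-\sigma(x_n)\}\|
\le \alpha^{-1}\|\sigma(y_n)-\sigma(x_n)\| \xrightarrow{n\to\infty} 0$$

as $\sigma$ is uniformly continuous and lower bounded by $\alpha>0$, while
$\|\sigma(x_n)^{-1}\{b(x_n)-b(y_n)\}\|\le\alpha^{-1}\|b(x_n)-b(y_n)\|\to 0$ as $n\to\infty$ as $b$ is
uniformly continuous. Therefore, if $q_\eta$ is continuous, then $g_n(z)$ converges to $q_\eta(z)$
pointwise. By Scheffé's lemma $g_n\to q_\eta$ in $L^1(dz)$.

Now suppose that $q_\eta$ is not continuous. Then there is for every $\varepsilon>0$ a nonnegative
continuous function $q^\varepsilon_\eta$ with compact support such that
$\|q_\eta-q^\varepsilon_\eta\|_{L^1(dz)}<\varepsilon$ [?, proposition 7.9]. Using the triangle
inequality, we easily estimate

$$\|P(x_n,\cdot)-P(y_n,\cdot)\|_{\rm TV} \le 2\varepsilon + \int\left|
\frac{q^\varepsilon_\eta(\sigma(x_n)^{-1}\{\sigma(y_n)z-b(x_n)+b(y_n)\})}{\det(\sigma(y_n)^{-1}\sigma(x_n))}
- q^\varepsilon_\eta(z)\right| dz.$$

But we have already established that the second term on the right converges to zero as
$n\to\infty$, and $\varepsilon>0$ is arbitrary. This completes the proof. □

Filter stability now follows from theorem 2.4 whenever
$\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}$. It
remains to prove that when $q_\eta,q_\xi>0$ the absolute continuity requirement is in fact
superfluous:

$$\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV}\xrightarrow{n\to\infty}0 \quad \mathbf{P}^\mu\text{-a.s. for any } \mu,\nu.$$

Indeed, if this is the case, then by the triangle inequality

$$\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV} \le \|\pi^\gamma_n-\pi^\mu_n\|_{\rm TV} + \|\pi^\gamma_n-\pi^\nu_n\|_{\rm TV}
\xrightarrow{n\to\infty}0 \quad \mathbf{P}^\gamma\text{-a.s. for any } \mu,\nu,\gamma,$$

which completes the proof of proposition 2.5.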

To establish the claim, note that when $q_\eta>0$ the transition kernel $P(x,\cdot)$ has a strictly
positive density with respect to the Lebesgue measure for every $x\in E$. In particular,
$P(x,\cdot)\sim P(z,\cdot)$ for every $x,z\in E$. As $q_\xi>0$ the filtering recursion is well
defined under any initial measure, and it is immediately evident from the filtering recursion
that $\pi^\mu_{1-}$ has a strictly positive density with respect to the Lebesgue measure for every
$\mu$. In particular, this implies that $\pi^\mu_{1-}\sim\pi^\nu_{1-}$ for every $\mu,\nu$. But it
is not difficult to establish that (see, e.g., the proof of [15, lemma 5.12])

$$\mathbf{E}^\mu\Big(\limsup_{n\to\infty}\|\pi^\mu_n-\pi^\nu_n\|_{\rm TV}\,\Big|\,Y_0=y\Big) =
\mathbf{E}^{\bar\mu(y)}\Big(\limsup_{n\to\infty}\|\pi^{\bar\mu(y)}_n-\pi^{\bar\nu(y)}_n\|_{\rm TV}\Big),$$

where $\bar\mu(y)$ and $\bar\nu(y)$ denote the regular conditional probabilities
$\mathbf{P}^\mu(X_1\in\cdot\,|\,Y_0=y)$ and $\mathbf{P}^\nu(X_1\in\cdot\,|\,Y_0=y)$ (i.e.,
$\bar\mu(Y_0)=\pi^\mu_{1-}$ and $\bar\nu(Y_0)=\pi^\nu_{1-}$). As $\bar\mu(y)\sim\bar\nu(y)$ for
(almost) every $y\in\mathbb{R}^n$, we have
$\mathbf{P}^{\bar\mu(y)}|_{\sigma\{(Y_k)_{k\ge 0}\}}\sim\mathbf{P}^{\bar\nu(y)}|_{\sigma\{(Y_k)_{k\ge 0}\}}$
and the claim follows from theorem 2.4.

Remark 5.3. An almost identical argument shows that when $q_\eta,q_\xi>0$, absolute continuity
of the observations
$\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge 0}\}}\sim\mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge 0}\}}$ holds for
any pair of initial measures $\mu,\nu$. Together with theorem 2.4 this gives the desired claim.
Note in particular that in this setting $\pi^\mu_n$ is $\mathbf{P}^\nu$-a.s. uniquely defined for
any $\mu,\nu$, so that the statement of proposition 2.5 is in fact well posed (i.e., we do not need
to be careful to choose a specific version of the filter).